
Writing a Plug-in (Site Map)

For general background, see the overview of the Central Event System.

The following is intended to be viewed side-by-side with the "Plug-in_SiteMap" demo projects available under the install directory. The purpose of the code is to produce a simple map (list of pages and what they link to) of any web-site as it is being imported (crawled).

Step 1. Create a new project

Create a new class library project (to produce a DLL). It can be given any name that Visual Studio accepts.

Step 2. Add references

Add references to Keyoti.SearchEngine.Core.DLL, Keyoti.SearchEngine.License.DLL, Keyoti.Text.MSOffice.DLL, Keyoti.Text.LemmaGenerator.DLL (note that you must use the Keyoti2... versions if you use those assemblies in your toolbox - see the notes section 'References To Search DLLs').

Step 3. Add a new class

Create a new class named 'ExternalEventHandler' and place it in a namespace called 'Keyoti.SearchEngine'. Note that for VB.NET projects you will need to edit the project properties and delete the 'Root namespace' setting in order for the code to work as presented.

C#
using System;
using System.Collections;
using System.Text;
using System.IO;
using Keyoti.SearchEngine.Events;
using Keyoti.SearchEngine.Documents;
using Keyoti.SearchEngine.DataAccess;
using Keyoti.SearchEngine;

namespace Keyoti.SearchEngine
{
    /// <summary>
    /// Creates a site-map when a web-site is crawled.
    /// </summary>
    public class ExternalEventHandler
    {

        IEventDispatcher dispatcher;
        Configuration conf;
        public ExternalEventHandler(IEventDispatcher dispatcher, Configuration conf)
        {

        }


        public void DetachHandlers()
        {


        }


    }
}

VB.NET
Imports Keyoti.SearchEngine.Events
Imports Keyoti.SearchEngine.Documents
Imports Keyoti.SearchEngine.DataAccess
Imports Keyoti.SearchEngine
Imports System.Collections
Imports System.IO

Namespace Keyoti.SearchEngine
    ''' <summary>
    ''' Creates a site-map when a web-site is crawled.
    ''' </summary>
    Public Class ExternalEventHandler



        Private dispatcher As IEventDispatcher
        Private conf As Configuration

        Public Sub New(ByRef dispatcher As IEventDispatcher, ByRef conf As Configuration)

        End Sub

        Public Sub DetachHandlers()

        End Sub

    End Class
End Namespace

This is now an empty plug-in: it will compile and the search engine can attach to it, but it will not do anything yet.

Step 4. Attach/detach event handlers

In order to do something, the class needs to attach event handlers in the constructor, and detach those handlers when asked.

C#
        public ExternalEventHandler(IEventDispatcher dispatcher, Configuration conf)
        {
            // Written to SiteMapper.txt when logging is enabled in the configuration.
            Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("SiteMapper", "Initialized", conf);
            // Attach the event handlers.
            dispatcher.Action += new ActionEventHandler(dispatcher_Action);
            dispatcher.NeedObject += new NeedObjectEventHandler(dispatcher_NeedObject);
            // Keep references so that the handlers can be detached later.
            this.dispatcher = dispatcher;
            this.conf = conf;

            ...
        }


        public void DetachHandlers()
        {
            if (dispatcher != null)
            {
                // Detach the same handlers that the constructor attached.
                dispatcher.Action -= new ActionEventHandler(dispatcher_Action);
                dispatcher.NeedObject -= new NeedObjectEventHandler(dispatcher_NeedObject);
            }
            ...
        }
VB.NET
        Public Sub New(ByRef dispatcher As IEventDispatcher, ByRef conf As Configuration)
            MyBase.New()
            ' Written to SiteMapper.txt when logging is enabled in the configuration.
            Log.WriteLogEntry("SiteMapper", "Initialized", conf)
            ' Attach the event handlers.
            AddHandler dispatcher.Action, AddressOf Me.dispatcher_Action
            AddHandler dispatcher.NeedObject, AddressOf Me.dispatcher_NeedObject
            ' Keep references so that the handlers can be detached later.
            Me.dispatcher = dispatcher
            Me.conf = conf
            ...
        End Sub

        Public Sub DetachHandlers()
            If (Not (dispatcher) Is Nothing) Then
                ' Detach the same handlers that the constructor attached.
                RemoveHandler dispatcher.Action, AddressOf Me.dispatcher_Action
                RemoveHandler dispatcher.NeedObject, AddressOf Me.dispatcher_NeedObject
            End If
            ...
        End Sub

The code also writes to a custom log file, named "SiteMapper.txt" (if Logging is enabled in Configuration) and keeps a reference to the configuration (conf) and event dispatcher (dispatcher).
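
The attach/detach pattern can be tried in isolation. The following is a minimal sketch, not the Keyoti API: DemoDispatcher, DemoActionHandler and DemoPlugin are hypothetical stand-ins invented so that the example compiles without the search engine assemblies. It illustrates why detaching with a newly created delegate works: in .NET, two delegates that wrap the same target object and method compare as equal, so the -= operator finds and removes the handler that the constructor attached.

```csharp
using System;

// Hypothetical stand-ins, used only so this sketch compiles on its own.
// They are NOT part of the Keyoti API.
public delegate void DemoActionHandler(object sender, EventArgs e);

public class DemoDispatcher
{
    public event DemoActionHandler Action;

    // Raises the event and returns how many handlers were invoked.
    public int Raise()
    {
        DemoActionHandler handlers = Action;
        if (handlers == null) return 0;
        handlers(this, EventArgs.Empty);
        return handlers.GetInvocationList().Length;
    }
}

public class DemoPlugin
{
    private readonly DemoDispatcher dispatcher;

    // Counts how many times the handler has run.
    public int Calls;

    public DemoPlugin(DemoDispatcher dispatcher)
    {
        this.dispatcher = dispatcher;
        // Attach in the constructor, as in Step 4.
        dispatcher.Action += new DemoActionHandler(OnAction);
    }

    public void DetachHandlers()
    {
        // A new delegate wrapping the same target and method compares equal
        // to the one attached in the constructor, so this removes it.
        dispatcher.Action -= new DemoActionHandler(OnAction);
    }

    private void OnAction(object sender, EventArgs e)
    {
        Calls++;
    }
}
```

With these stand-ins, Raise() invokes one handler before DetachHandlers() is called and none afterwards.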

Step 5. Add event handler methods

Event handler methods are added, and the constructor/detach methods are completed with calls to create/close a StreamWriter.

C#
using System;
using System.Collections;
using System.Text;
using System.IO;
using Keyoti.SearchEngine.Events;
using Keyoti.SearchEngine.Documents;
using Keyoti.SearchEngine.DataAccess;
using Keyoti.SearchEngine;

namespace Keyoti.SearchEngine
{
    /// <summary>
    /// Creates a site-map when a web-site is crawled.
    /// </summary>
    public class ExternalEventHandler
    {
        StreamWriter sw;

        IEventDispatcher dispatcher;
        Configuration conf;
        public ExternalEventHandler(IEventDispatcher dispatcher, Configuration conf)
        {
            Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("SiteMapper", "Initialized", conf);
            dispatcher.Action += new ActionEventHandler(dispatcher_Action);
            dispatcher.NeedObject += new NeedObjectEventHandler(dispatcher_NeedObject);
            this.dispatcher = dispatcher;
            this.conf = conf;

            sw = new StreamWriter(Path.Combine(conf.IndexDirectory, "sitemap.txt"), false);
        }


        public void DetachHandlers()
        {
            if (dispatcher != null)
            {
                dispatcher.Action -= new ActionEventHandler(dispatcher_Action);
                dispatcher.NeedObject -= new NeedObjectEventHandler(dispatcher_NeedObject);
            }
            sw.Close();

        }


        public void dispatcher_Action(object sender, ActionEventArgs e)
        {
            Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("CustomAssembly", e.ActionData.Name.ToString(), conf);
            if (e.ActionData.Name == ActionName.DocumentBeingCrawled)
            {
                Document document = (e.ActionData.Data as object[])[0] as Document;
                ArrayList links = (e.ActionData.Data as object[])[1] as ArrayList;

                sw.WriteLine("##################################################################");
                sw.WriteLine(document.Uri.AbsoluteUri);
                sw.WriteLine("------------------------------------------------------------------");
                foreach (Uri link in links)
                    sw.WriteLine(link.AbsoluteUri);

                sw.Flush();
            }
        }

        public void dispatcher_NeedObject(object sender, NeedObjectEventArgs e)
        {

        }
    }
}

VB.NET
Imports Keyoti.SearchEngine.Events
Imports Keyoti.SearchEngine.Documents
Imports Keyoti.SearchEngine.DataAccess
Imports Keyoti.SearchEngine
Imports System.Collections
Imports System.IO

Namespace Keyoti.SearchEngine
    ''' <summary>
    ''' Creates a site-map when a web-site is crawled.
    ''' </summary>
    Public Class ExternalEventHandler

        Private sw As StreamWriter

        Private dispatcher As IEventDispatcher

        Private conf As Configuration

        Public Sub New(ByRef dispatcher As IEventDispatcher, ByRef conf As Configuration)
            MyBase.New()
            Log.WriteLogEntry("SiteMapper", "Initialized", conf)
            AddHandler dispatcher.Action, AddressOf Me.dispatcher_Action
            AddHandler dispatcher.NeedObject, AddressOf Me.dispatcher_NeedObject
            Me.dispatcher = dispatcher
            Me.conf = conf
            sw = New StreamWriter(Path.Combine(conf.IndexDirectory, "sitemap.txt"), False)
        End Sub

        Public Sub DetachHandlers()
            If (Not (dispatcher) Is Nothing) Then
                RemoveHandler dispatcher.Action, AddressOf Me.dispatcher_Action
                RemoveHandler dispatcher.NeedObject, AddressOf Me.dispatcher_NeedObject

            End If
            sw.Close()
        End Sub

        Public Sub dispatcher_Action(ByVal sender As Object, ByVal e As ActionEventArgs)
            Log.WriteLogEntry("CustomAssembly", e.ActionData.Name.ToString, conf)
            If (e.ActionData.Name = ActionName.DocumentBeingCrawled) Then
                Dim document As Document = CType(CType(e.ActionData.Data, Object())(0), Document)
                Dim links As ArrayList = CType(CType(e.ActionData.Data, Object())(1), ArrayList)
                sw.WriteLine("##################################################################")
                sw.WriteLine(document.Uri.AbsoluteUri)
                sw.WriteLine("------------------------------------------------------------------")
                For Each link As Uri In links
                    sw.WriteLine(link.AbsoluteUri)
                Next
                sw.Flush()
            End If
        End Sub

        Public Sub dispatcher_NeedObject(ByVal sender As Object, ByVal e As NeedObjectEventArgs)

        End Sub
    End Class
End Namespace


The site-mapper is only interested in the Action event, and in particular in actions named DocumentBeingCrawled (for interest, the example writes every action it receives to the log file CustomAssembly.txt if Logging is enabled). When an action named DocumentBeingCrawled occurs, the ActionData.Data object is used to access the document in question and its links (information about action names and their associated data is in the API documentation 'Namespaces'). The document Uri and the list of links are written to the text file by a StreamWriter.
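
The handler above casts the payload directly, which would throw if the data ever had a different shape. As a hedged illustration only, the sketch below shows a more defensive way to unpack a payload shaped like the one described (an object array whose second element is an ArrayList of Uri) and to write one entry in the same layout. SiteMapEntry, TryGetLinks and Write are hypothetical names that are not part of the library, and the page is passed as a plain Uri here rather than taken from a Document so that the code compiles on its own.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical helper for illustration; not part of the Keyoti API.
public static class SiteMapEntry
{
    // Separator lines copied from the tutorial code.
    private const string PageSeparator = "##################################################################";
    private const string LinkSeparator = "------------------------------------------------------------------";

    // Unpacks a payload shaped like object[] { document, ArrayList of links }.
    // Returns false instead of throwing when the shape is different.
    public static bool TryGetLinks(object data, out ArrayList links)
    {
        links = null;
        object[] parts = data as object[];
        if (parts == null || parts.Length < 2) return false;
        links = parts[1] as ArrayList;
        return links != null;
    }

    // Writes one site-map entry: separator, page URL, separator, one link per line.
    public static void Write(TextWriter writer, Uri page, IEnumerable links)
    {
        writer.WriteLine(PageSeparator);
        writer.WriteLine(page.AbsoluteUri);
        writer.WriteLine(LinkSeparator);
        foreach (object item in links)
        {
            // Skip anything in the list that is not a Uri.
            Uri link = item as Uri;
            if (link != null) writer.WriteLine(link.AbsoluteUri);
        }
    }
}
```

Because Write accepts any TextWriter, the layout can be checked against a StringWriter without touching the file system.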

Step 6. Using the plug-in DLL

Now that the project is complete, it can be compiled and used. For development and testing, the easiest way to use the plug-in is to create an Index Directory whose configuration points to the DLL. In the demo project this is already done: a folder named IndexDirectory under the project contains only a configuration.xml file. The Configuration.EventHandlerAssemblyPath property is set to the path of the DLL, relative to the index directory (e.g. ..\bin\Plug-in_SiteMap_vb.dll). Since any search process working on this Index Directory will use the plug-in DLL, it can be tested very simply:

  1. Open the Windows based Index Manager Tool - be sure to select the CLR version that matches the references in the plug-in project
  2. Set the Index Directory path to the absolute path of the IndexDirectory folder under the project
  3. Edit the configuration from the tool: note the EventHandlerAssemblyPath setting, enable logging if desired, and save
  4. Launch the manager and perform an import
  5. The file 'sitemap.txt' should contain information about the web-site imported
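
To inspect the result programmatically, the file can be read back into a page-to-links map. The sketch below assumes only the layout produced by the Step 5 code (a line of '#' characters, the page URL, a line of '-' characters, then one link per line). SiteMapReader is a hypothetical helper, not something shipped with the search engine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the Keyoti API.
public static class SiteMapReader
{
    // Parses text in the layout written by the tutorial code and returns
    // a map from each page URL to the links found on that page.
    public static Dictionary<string, List<string>> Parse(TextReader reader)
    {
        var map = new Dictionary<string, List<string>>();
        List<string> current = null;
        bool expectPage = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;

            if (line.StartsWith("#####", StringComparison.Ordinal))
            {
                // The next non-empty line is the page URL.
                expectPage = true;
                current = null;
            }
            else if (expectPage)
            {
                // A page that appears twice keeps a single merged entry.
                if (!map.TryGetValue(line, out current))
                {
                    current = new List<string>();
                    map[line] = current;
                }
                expectPage = false;
            }
            else if (line.StartsWith("-----", StringComparison.Ordinal))
            {
                // Separator between the page URL and its links.
                continue;
            }
            else if (current != null)
            {
                current.Add(line);
            }
        }
        return map;
    }
}
```

After the import has finished, pass Parse a StreamReader opened on sitemap.txt in the index directory.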

Debugging Notes

Due to the nature of DLL loading, there are some handy tips: